PCNN: Projection Convolutional Neural Networks


FIGURE 3.14
We visualize the distribution of the kernel weights of the first convolutional layer of PCNN-22. The variance increases as λ, the ratio that balances the projection loss and the cross-entropy loss, decreases. In particular, when λ = 0 (no projection loss), only one group is obtained, with the kernel weights distributed around 0, which can cause instability during binarization. In contrast, the two Gaussians obtained with the projection loss (λ > 0) are more powerful than the single one without it, yielding better BNNs, as also validated in Table 3.2.
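To make the role of λ concrete, below is a minimal numpy sketch, not the book's exact formulation: the training objective combines the cross-entropy loss with a projection loss weighted by λ, where the hypothetical projection term pulls the full-precision weights toward their binary projection α · sign(w). With λ = 0 that pull vanishes, which is why the weights stay in a single cluster around zero.

```python
import numpy as np

def projection_loss(w, alpha=1.0):
    # Hypothetical projection term: mean squared distance between the
    # full-precision weights and their binary projection alpha * sign(w).
    return float(np.mean((w - alpha * np.sign(w)) ** 2))

def total_loss(ce_loss, w, lam):
    # lam plays the role of lambda in the text: it balances the
    # cross-entropy loss against the projection loss.
    return ce_loss + lam * projection_loss(w)

w = np.array([0.05, -0.03, 0.9, -1.1])
print(total_loss(0.7, w, lam=0.0))    # cross-entropy only, no projection pull
print(total_loss(0.7, w, lam=1e-4))   # slightly larger: projection term added
```

The weight values, α, and the quadratic form of the projection loss here are illustrative assumptions; only the λ-weighted combination of the two losses follows the text.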

curves) converge faster than PCNNs with λ = 0 (yellow curves) once the epoch number exceeds 150.

Diversity visualization In Fig. 3.17, with J = 4, we visualize four channels of the binary kernels D^l_i in the first row, the feature maps produced by D^l_i in the second row, and the corresponding feature maps after binarization in the third row. This illustrates the diversity of kernels and feature maps in PCNNs: multiple projection functions capture diverse information and achieve strong performance even with compressed models.
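The three rows of Fig. 3.17 can be mimicked with a small numpy sketch. As a stand-in for the J = 4 learned projections (which the book obtains via trained projection matrices), each binary kernel D_i here simply applies its own threshold before the sign; the thresholds and toy sizes are assumptions for illustration only.

```python
import numpy as np

def binarize(t):
    # Sign binarization with zeros mapped to +1.
    return np.where(t >= 0, 1.0, -1.0)

def conv2d_valid(x, k):
    # Minimal 'valid' 2-D cross-correlation, sufficient for this sketch.
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))     # one full-precision kernel
x = rng.normal(size=(6, 6))          # a toy input feature map

# Hypothetical stand-in for the J = 4 projections: per-copy thresholds
# make the binary kernels D_i differ from one another.
thresholds = [-0.5, -0.1, 0.1, 0.5]
D = [binarize(kernel - t) for t in thresholds]       # first row of Fig. 3.17
feature_maps = [conv2d_valid(x, d) for d in D]       # second row
binary_maps = [binarize(f) for f in feature_maps]    # third row
```

Because each D_i binarizes the same kernel differently, the resulting feature maps differ as well, which is the diversity the figure illustrates.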

FIGURE 3.15
With λ fixed to 1e4, the variance of the kernel weights decreases from the 2nd epoch to the 200th epoch, confirming that the projection loss does not affect convergence.
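The statistic plotted in Fig. 3.15 is simply the variance of a layer's kernel weights, logged once per epoch. A minimal sketch, with hypothetical weight tensors standing in for early- and late-epoch snapshots:

```python
import numpy as np

def kernel_weight_variance(weights):
    # Flatten the (out, in, kh, kw) kernel tensor and return its variance,
    # i.e. the per-epoch statistic shown in Fig. 3.15.
    return float(np.var(np.asarray(weights).ravel()))

# Assumed toy snapshots: broad weights early in training vs. weights
# tightened around +/-0.5 late in training (values are illustrative).
rng = np.random.default_rng(0)
early = rng.normal(0.0, 1.0, size=(16, 3, 3, 3))
late = 0.5 * np.sign(early) + rng.normal(0.0, 0.05, size=early.shape)
print(kernel_weight_variance(early), kernel_weight_variance(late))
```

The decreasing variance across epochs reported in the caption would show up as the first printed value exceeding the second.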